13:14
2026-05-23
dev.to
large-language-models
Multi-Head Latent Attention (MLA)
**Summary:** Multi-Head Latent Attention (MLA) is an attention mechanism used in DeepSeek-V2/V3 and Kimi K2.x models that compresses the Key-Value (KV) cache by projecting full KV pairs into a shared,…
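
To make the compression idea concrete, here is a minimal NumPy sketch of low-rank KV caching in the spirit of MLA. It is an illustration under stated assumptions, not the DeepSeek or Kimi implementation: the function names, the random weights, and the dimensions are all hypothetical, and the separate rotary-position key path used in practice is left out. The sketch caches one small latent vector per token and rebuilds per-head keys and values from it on demand.

```python
import numpy as np


def make_mla_projections(d_model, d_latent, n_heads, d_head, seed=0):
    """Create random (hypothetical) projection matrices for the sketch."""
    rng = np.random.default_rng(seed)
    # Down-projection: hidden state -> shared latent (this is what gets cached).
    w_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
    # Up-projections: shared latent -> keys / values for all heads.
    w_up_k = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
    w_up_v = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
    return w_down, w_up_k, w_up_v


def compress_kv(hidden, w_down):
    """Project hidden states (seq, d_model) to latents (seq, d_latent)."""
    return hidden @ w_down


def expand_kv(latent, w_up_k, w_up_v, n_heads, d_head):
    """Rebuild per-head keys and values from the cached latents."""
    seq_len = latent.shape[0]
    keys = (latent @ w_up_k).reshape(seq_len, n_heads, d_head)
    values = (latent @ w_up_v).reshape(seq_len, n_heads, d_head)
    return keys, values


def cache_floats_per_token(d_latent, n_heads, d_head):
    """Compare per-token cache size: full keys and values vs. one latent."""
    return {"mha": 2 * n_heads * d_head, "latent": d_latent}


if __name__ == "__main__":
    # Illustrative sizes only; real models use much larger dimensions.
    d_model, d_latent, n_heads, d_head = 64, 8, 4, 16
    w_down, w_up_k, w_up_v = make_mla_projections(d_model, d_latent, n_heads, d_head)
    hidden = np.random.default_rng(1).standard_normal((5, d_model))
    latent = compress_kv(hidden, w_down)  # only this array is stored
    keys, values = expand_kv(latent, w_up_k, w_up_v, n_heads, d_head)
    print(latent.shape, keys.shape, values.shape)
    print(cache_floats_per_token(d_latent, n_heads, d_head))
```

In this toy setup the cache holds `d_latent` numbers per token instead of `2 * n_heads * d_head`, which is where the memory saving comes from; the cost is the extra up-projection when keys and values are needed.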